Exploratory Data Analysis¶
Importing processed data¶
| | date | adj_close | close | high | low | open | volume | ticker | revenues | cost_of_goods | ... | current_ratio | debt_to_equity_ratio | ebitda_margin | gross_margin | net_income_margin | dividend_yield | payout_ratio | return_on_assets | return_on_equity | return_on_capital |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2010-04-01 | 7.108997 | 8.427500 | 8.526071 | 8.312500 | 8.478929 | 603145200 | AAPL | 13499000.0 | 7874000.0 | ... | 2.644206 | 0.450061 | 0.314468 | 15.785443 | 0.227721 | 0.0 | 0.0 | 0.054664 | 0.078123 | 0.089877 |
| 1 | 2010-04-05 | 7.184915 | 8.517500 | 8.518214 | 8.384643 | 8.392143 | 684507600 | AAPL | 13499000.0 | 7874000.0 | ... | 2.644206 | 0.450061 | 0.314468 | 15.785443 | 0.227721 | 0.0 | 0.0 | 0.054664 | 0.078123 | 0.089877 |
| 2 | 2010-04-06 | 7.216546 | 8.555000 | 8.580000 | 8.464286 | 8.507143 | 447017200 | AAPL | 13499000.0 | 7874000.0 | ... | 2.644206 | 0.450061 | 0.314468 | 15.785443 | 0.227721 | 0.0 | 0.0 | 0.054664 | 0.078123 | 0.089877 |
| 3 | 2010-04-07 | 7.248483 | 8.592857 | 8.640000 | 8.523571 | 8.555357 | 628502000 | AAPL | 13499000.0 | 7874000.0 | ... | 2.644206 | 0.450061 | 0.314468 | 15.785443 | 0.227721 | 0.0 | 0.0 | 0.054664 | 0.078123 | 0.089877 |
| 4 | 2010-04-08 | 7.228898 | 8.569643 | 8.626429 | 8.501429 | 8.587143 | 572989200 | AAPL | 13499000.0 | 7874000.0 | ... | 2.644206 | 0.450061 | 0.314468 | 15.785443 | 0.227721 | 0.0 | 0.0 | 0.054664 | 0.078123 | 0.089877 |
5 rows × 106 columns
Plot close price¶
This code is used to visualize the historical closing prices of Apple Inc. (AAPL) stock over time. It first converts the date column in the DataFrame to datetime objects, ensuring proper handling of time-based data. Then, a line plot is generated with the date on the x-axis and the closing prices (close) on the y-axis. The plot is labeled with "date" for the x-axis and "Close Price" for the y-axis, and the title "AAPL Close Price" is added to the chart. Finally, the plot is displayed to help identify trends, fluctuations, and patterns in the stock's performance over the selected time period.
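The plotting step described above can be sketched as follows. This is a minimal illustration, assuming the DataFrame is named `df` and has `date` and `close` columns as in the preview table; the five rows are copied from that preview. The `Agg` backend line is only there so the sketch runs without a display and can be dropped inside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption); omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Toy stand-in for the notebook's DataFrame (rows copied from the preview table).
df = pd.DataFrame({
    "date": ["2010-04-01", "2010-04-05", "2010-04-06", "2010-04-07", "2010-04-08"],
    "close": [8.427500, 8.517500, 8.555000, 8.592857, 8.569643],
})

# Parse the date strings so the x-axis is treated as time, not as text labels.
df["date"] = pd.to_datetime(df["date"])

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(df["date"], df["close"])
ax.set_xlabel("date")
ax.set_ylabel("Close Price")
ax.set_title("AAPL Close Price")
plt.show()
```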
Normalized Net Income¶
Normalize 'Close' and 'Net Income' Columns
- The `close` and `net_income` columns are normalized using Min-Max scaling, which transforms the values to a range between 0 and 1:
  - `normalized_close`: Scales the `close` prices.
  - `normalized_net_income`: Scales the `net_income` values.
Plot Normalized Data
- A plot is created with the `date` on the x-axis and the normalized values (`normalized_close` and `normalized_net_income`) on the y-axis.
- The line for `normalized_close` is labeled as "Normalized Close Price," and the line for `normalized_net_income` is labeled as "Normalized Net Income."
Add Labels and Title
- The x-axis is labeled as "Date," and the y-axis is labeled as "Normalized Value."
- The title of the plot is set to "Normalized Close Price vs. Normalized Net Income."
Add Grid and Legend
- A grid is added for better readability, and a legend is included to differentiate the two lines in the plot.
Display the Plot
- `plt.show()` is called to display the final plot.
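Min-Max scaling maps each value x to (x - min) / (max - min). The sketch below applies it with plain pandas; the helper name `min_max_scale` and the toy values are assumptions for illustration, and the notebook itself may use a library scaler instead.

```python
import pandas as pd

def min_max_scale(series: pd.Series) -> pd.Series:
    """Rescale a numeric Series linearly so its minimum maps to 0 and its maximum to 1."""
    span = series.max() - series.min()
    if span == 0:
        # Constant column: the formula is undefined, so return zeros (an arbitrary choice).
        return pd.Series(0.0, index=series.index)
    return (series - series.min()) / span

# Toy data (hypothetical values, not from the notebook).
df = pd.DataFrame({
    "close": [8.0, 9.0, 10.0, 12.0],
    "net_income": [100.0, 100.0, 150.0, 300.0],
})

df["normalized_close"] = min_max_scale(df["close"])
df["normalized_net_income"] = min_max_scale(df["net_income"])
print(df)
```

Putting both series on a 0 to 1 scale is what lets price and net income share one y-axis even though their raw magnitudes differ by orders of magnitude.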
Calculate Moving Average¶
Define Moving Average Period
- `mavgd` is set to 30, specifying the number of days for the moving average calculation.
Calculate Moving Average
- The function `calculate_moving_average()` computes the moving average for the 'close' prices using a rolling window of size `window`. It creates a new column in the DataFrame, labeled `{window}_DMA`, representing the moving average for the specified window.
Apply Moving Average Calculation
- The function is applied to the DataFrame `df` with the moving average period (`mavgd = 30`), creating a new column called `30_DMA` containing the 30-day moving average of the 'close' price.
Plot Close Price and Moving Average
- A plot is created with the `date` on the x-axis and both the `close` price and the calculated moving average (`30_DMA`) on the y-axis.
- The plot displays two lines: one for the "Close Price" and another for the "30-Day Moving Average."
Add Labels and Title
- The x-axis is labeled as "Date," and the y-axis is labeled as "Price."
- The title of the plot is dynamically set to "AAPL Close Price and 30-Day Moving Average."
Add Grid and Legend
- A grid is added to the plot for better readability, and a legend is included to differentiate between the "Close Price" and "30-Day Moving Average."
Display the Plot
- `plt.show()` is called to display the final plot.
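A possible shape for `calculate_moving_average()` is sketched below. The function body is an assumption, since the notebook's code is not shown here. The use of `min_periods=1` is inferred from the output table later in the notebook, where `30_DMA` already has a value on the first row and equals the running mean of the closes seen so far.

```python
import pandas as pd

def calculate_moving_average(df: pd.DataFrame, window: int) -> pd.DataFrame:
    """Add a `{window}_DMA` column holding the rolling mean of `close`."""
    # min_periods=1 (assumption): average whatever rows are available
    # until a full window has accumulated, instead of emitting NaN.
    df[f"{window}_DMA"] = df["close"].rolling(window=window, min_periods=1).mean()
    return df

mavgd = 30

# First five closing prices from the preview table.
df = pd.DataFrame({"close": [8.427500, 8.517500, 8.555000, 8.592857, 8.569643]})
df = calculate_moving_average(df, mavgd)
print(df)
```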
Label Generation¶
Define Moving Average Period for Label Generation
- `labels_moving_average_days` is set to 10, specifying the number of days for the rolling average calculation used to generate the buy/sell signals.
Calculate Moving Average
- The 10-day rolling average of the `close` price is calculated using the `rolling(window=labels_moving_average_days)` method, and a new column, `10_Day_Avg`, is added to the DataFrame to store this value.
Shift the Moving Average
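- The `10_Day_Avg` column is shifted up by 10 rows (using `.shift(-labels_moving_average_days)`), so each row holds the rolling average computed 10 trading days later, that is, the mean of the next 10 closing prices. Each day's close price can then be compared with this future average, which is stored in a new column, `Next_10_Day_Avg`. The last 10 rows have no future average and are left as NaN.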
Generate Buy/Sell Signals
- A new column, `signal`, is created and initialized with a default value of "SELL."
- A condition is applied to identify where the `close` price is lower than the `Next_10_Day_Avg`. For these rows, the `signal` column is updated to "BUY."
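The label-generation steps above can be sketched as follows. The toy prices are hypothetical, and the window is shortened to 3 (the notebook uses 10) so the effect of the shift is visible in a handful of rows.

```python
import pandas as pd

labels_moving_average_days = 3  # the notebook uses 10; 3 keeps the toy example short

# Hypothetical closing prices: a rise followed by a decline.
df = pd.DataFrame({"close": [10.0, 11.0, 12.0, 14.0, 12.0, 11.0, 10.0, 9.0]})

avg_col = f"{labels_moving_average_days}_Day_Avg"
next_col = f"Next_{labels_moving_average_days}_Day_Avg"

# Trailing rolling mean: NaN until a full window is available.
df[avg_col] = df["close"].rolling(window=labels_moving_average_days).mean()

# A negative shift pulls later rows up, so row t now holds the mean of
# close[t+1 .. t+window]. The last `window` rows become NaN.
df[next_col] = df[avg_col].shift(-labels_moving_average_days)

# Default to SELL, then mark BUY where today's close is below the future average.
# Comparisons against NaN are False, so the trailing rows stay SELL.
df["signal"] = "SELL"
df.loc[df["close"] < df[next_col], "signal"] = "BUY"
print(df)
```

Because `Next_*_Day_Avg` is built from future prices, it is only usable as a training target; it cannot be computed at prediction time.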
Rationale for Choosing This as the Target Variable¶
- Stock prices are difficult to predict directly because many factors drive them and they vary widely.
- While the trend (up or down) over a short or long period can be estimated from indicators such as a moving-average comparison, these are lagging indicators.
- Such indicators smooth the price series, but because they lag, most of the move has already happened by the time they signal.
- If we can predict, x days beforehand, that the moving average will cross the current close price, we can enter the trend early and capture some of the gain.
- The model will learn to predict these intersections beforehand.
- During training, we're essentially teaching the model to recognize patterns that lead to these intersections.
This code is used to create a simple trading strategy based on comparing a stock's close price with its future moving average to generate buy and sell signals.
Plotting the target variable with closing price and moving average¶
Scatter Plot for BUY/SELL signals¶
Purpose of Buy/Sell Signals
The code generates buy and sell signals based on the comparison between the current stock price and its future moving average. These signals can be used to develop a simple trading strategy:
- BUY: When the current price is below the future moving average, it suggests a potential upward movement, and a "BUY" signal is generated.
- SELL: When the current price is above the future moving average, it indicates that the price might decline, and a "SELL" signal is generated.
Plotting Buy and Sell Signals
The plot visualizes these buy and sell signals on the stock's price chart:
- Green Circles (BUY Signals): Represent points where the stock price is below the expected moving average, suggesting a buying opportunity.
- Red Circles (SELL Signals): Represent points where the stock price is above the expected moving average, indicating a potential selling point.
What the Plot Shows¶
- The plot displays the stock’s closing price over time, with green dots indicating where the algorithm suggests buying the stock, and red dots marking suggested sell points.
- The x-axis represents the date, showing the timeline over which these decisions were made, while the y-axis represents the close price of the stock.
- The combination of the price trend and these signals can help visualize potential entry and exit points for a trading strategy based on historical price movements and moving averages.
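A scatter plot of this kind can be sketched as below, assuming `df` already carries `date`, `close`, and `signal` columns. The toy rows and the styling choices are assumptions, and the `Agg` backend line is only needed when running without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption); omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical rows standing in for the labeled DataFrame.
df = pd.DataFrame({
    "date": pd.date_range("2020-01-01", periods=6, freq="D"),
    "close": [10.0, 11.0, 12.0, 14.0, 12.0, 11.0],
    "signal": ["BUY", "BUY", "BUY", "SELL", "SELL", "SELL"],
})

# Split the rows by label so each group gets its own color.
buys = df[df["signal"] == "BUY"]
sells = df[df["signal"] == "SELL"]

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(df["date"], df["close"], color="gray", linewidth=1, label="Close Price")
ax.scatter(buys["date"], buys["close"], color="green", marker="o", label="BUY")
ax.scatter(sells["date"], sells["close"], color="red", marker="o", label="SELL")
ax.set_xlabel("Date")
ax.set_ylabel("Close Price")
ax.legend()
plt.show()
```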
Find Cumulative Profit¶
The code simulates a simple trading strategy where buy and sell signals are used to track the cumulative profit of a stock position. The goal is to calculate the profit from buying and selling stocks based on a series of buy and sell signals.
Initialization:
The following variables are initialized to track the trading process:
- `stock_quantity`: Tracks how many stocks are held at any given point in time.
- `total_buy_price`: Tracks the total price spent on buying stocks.
- `stock_profit`: Tracks the profit made from each sell transaction.
- `quantity` and `buy_price` are set to 0 initially, representing no stock purchased.
Iterating Through the Data:
The code then iterates over each row of the DataFrame (`df`), simulating trading based on buy ('BUY') and sell ('SELL') signals.
When the signal is 'BUY':
- The quantity of stocks held (`quantity`) is increased by 1 (simulating the purchase of one stock).
- The total buy price (`buy_price`) is updated by adding the stock's closing price.
- The DataFrame is updated to reflect the current quantity of stock and total purchase price.
When the signal is 'SELL' (and there are stocks to sell):
The profit from selling is calculated as:
Profit = (quantity * current close price) - total buy price
After selling, the stock quantity and total buy price are reset to 0, and the profit for that sell transaction is recorded in the DataFrame.
When there is no buy or sell signal ('Hold' condition):
- If no action is taken (i.e., when the signal is not 'BUY' or 'SELL'), the stock holdings and total buy price remain unchanged.
Cumulative Profit Calculation:
After processing the signals, the cumulative profit is calculated using `cumsum()`, which returns the running total of profits up to each point in time. The cumulative profit reflects the overall performance of the simulated trading strategy up to each date in the DataFrame.
This simulation calculates the cumulative profit or loss for a strategy based on buy and sell signals.
Cumulative Profit: The final column, `cumulative_profit`, tracks the total profit from all completed buy/sell transactions as the algorithm progresses through the dataset. This gives a clear picture of how much profit would have been accumulated over time based on the trading signals provided.
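The simulation described above can be sketched as follows. The function name `simulate_trades` and the toy data are assumptions. The column names follow the output table, and the logic follows the description: buy one share per BUY row, sell everything on the first SELL row while holding shares, otherwise do nothing.

```python
import pandas as pd

def simulate_trades(df: pd.DataFrame) -> pd.DataFrame:
    """Replay BUY/SELL signals and record holdings, cost basis, and profit per row."""
    df = df.copy()
    quantity = 0      # shares currently held
    buy_price = 0.0   # total amount paid for the shares currently held
    quantities, buy_totals, profits = [], [], []

    for close, signal in zip(df["close"], df["signal"]):
        profit = 0.0
        if signal == "BUY":
            quantity += 1
            buy_price += close
        elif signal == "SELL" and quantity > 0:
            # Profit = (quantity * current close price) - total buy price
            profit = quantity * close - buy_price
            quantity = 0
            buy_price = 0.0
        # Any other case (for example SELL with nothing held) leaves state unchanged.
        quantities.append(quantity)
        buy_totals.append(buy_price)
        profits.append(profit)

    df["stock_quantity"] = quantities
    df["total_buy_price"] = buy_totals
    df["stock_profit"] = profits
    df["cumulative_profit"] = df["stock_profit"].cumsum()
    return df

# Hypothetical prices and signals.
df = pd.DataFrame({
    "close": [10.0, 11.0, 12.0, 14.0, 12.0, 11.0],
    "signal": ["BUY", "BUY", "BUY", "SELL", "SELL", "BUY"],
})
result = simulate_trades(df)
print(result)
```

Collecting the per-row values in lists and assigning them once is a design choice made here for speed and clarity; writing into the DataFrame row by row inside the loop gives the same result.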
| | date | adj_close | close | high | low | open | volume | ticker | revenues | cost_of_goods | ... | normalized_net_income | 30_DMA | 10_DMA | 10_Day_Avg | Next_10_Day_Avg | signal | stock_quantity | total_buy_price | stock_profit | cumulative_profit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2010-04-01 | 7.108997 | 8.427500 | 8.526071 | 8.312500 | 8.478929 | 603145200 | AAPL | 13499000.0 | 7874000.0 | ... | 0.000000 | 8.427500 | 8.427500 | NaN | 8.668214 | BUY | 1.0 | 8.427500 | 0.0 | 0.000000 |
| 1 | 2010-04-05 | 7.184915 | 8.517500 | 8.518214 | 8.384643 | 8.392143 | 684507600 | AAPL | 13499000.0 | 7874000.0 | ... | 0.000000 | 8.472500 | 8.472500 | NaN | 8.698857 | BUY | 2.0 | 16.945000 | 0.0 | 0.000000 |
| 2 | 2010-04-06 | 7.216546 | 8.555000 | 8.580000 | 8.464286 | 8.507143 | 447017200 | AAPL | 13499000.0 | 7874000.0 | ... | 0.000000 | 8.500000 | 8.500000 | NaN | 8.716893 | BUY | 3.0 | 25.500000 | 0.0 | 0.000000 |
| 3 | 2010-04-07 | 7.248483 | 8.592857 | 8.640000 | 8.523571 | 8.555357 | 628502000 | AAPL | 13499000.0 | 7874000.0 | ... | 0.000000 | 8.523214 | 8.523214 | NaN | 8.783393 | BUY | 4.0 | 34.092857 | 0.0 | 0.000000 |
| 4 | 2010-04-08 | 7.228898 | 8.569643 | 8.626429 | 8.501429 | 8.587143 | 572989200 | AAPL | 13499000.0 | 7874000.0 | ... | 0.000000 | 8.532500 | 8.532500 | NaN | 8.878107 | BUY | 5.0 | 42.662500 | 0.0 | 0.000000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2702 | 2020-12-23 | 128.059906 | 130.960007 | 132.429993 | 130.779999 | 132.160004 | 88223700 | AAPL | 58313000.0 | 35943000.0 | ... | 0.426626 | 122.093333 | 126.955000 | 126.955000 | NaN | SELL | 0.0 | 0.000000 | 0.0 | 4631.666698 |
| 2703 | 2020-12-24 | 129.047501 | 131.970001 | 133.460007 | 131.100006 | 131.320007 | 54930100 | AAPL | 58313000.0 | 35943000.0 | ... | 0.426626 | 122.509333 | 127.828001 | 127.828001 | NaN | SELL | 0.0 | 0.000000 | 0.0 | 4631.666698 |
| 2704 | 2020-12-28 | 133.662994 | 136.690002 | 137.339996 | 133.509995 | 133.990005 | 124486200 | AAPL | 58313000.0 | 35943000.0 | ... | 0.426626 | 123.092000 | 129.256001 | 129.256001 | NaN | SELL | 0.0 | 0.000000 | 0.0 | 4631.666698 |
| 2705 | 2020-12-29 | 131.883286 | 134.869995 | 138.789993 | 134.339996 | 138.050003 | 121047300 | AAPL | 58313000.0 | 35943000.0 | ... | 0.426626 | 123.612333 | 130.565000 | 130.565000 | NaN | SELL | 0.0 | 0.000000 | 0.0 | 4631.666698 |
| 2706 | 2020-12-30 | 130.758759 | 133.720001 | 135.990005 | 133.399994 | 135.580002 | 96452100 | AAPL | 58313000.0 | 35943000.0 | ... | 0.426626 | 124.059666 | 131.149001 | 131.149001 | NaN | SELL | 0.0 | 0.000000 | 0.0 | 4631.666698 |
2707 rows × 117 columns
2
Number of profitable trades: 183
Number of losing trades: 2
Number of neutral trades: 2522
Total Profit: 4631.67
Average Profit per Trade: 25.04
Max Profit: 448.06
Max Loss: -0.09
Profit Standard Deviation: 17.34
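Summary figures of this kind can be derived from the `stock_profit` column, as in the sketch below. The exact definitions used in the notebook are not shown, so each one here is an assumption. In particular, the average is taken over rows with a non-zero profit, because 4631.67 / (183 + 2) is approximately 25.04, which matches the reported value. The standard deviation is left out because the rows it was computed over cannot be determined from the output.

```python
import pandas as pd

# Hypothetical per-row profits; zero means no sale was closed on that row.
stock_profit = pd.Series([0.0, 0.0, 9.0, 0.0, -1.5, 0.0, 4.5])

profitable = int((stock_profit > 0).sum())
losing = int((stock_profit < 0).sum())
neutral = int((stock_profit == 0).sum())

# Assumption: "per trade" means per row with a non-zero realized profit.
closed_trades = stock_profit[stock_profit != 0]

total_profit = stock_profit.sum()
average_profit = closed_trades.mean()
max_profit = stock_profit.max()
max_loss = stock_profit.min()

print(f"Number of profitable trades: {profitable}")
print(f"Number of losing trades: {losing}")
print(f"Number of neutral trades: {neutral}")
print(f"Total Profit: {total_profit:.2f}")
print(f"Average Profit per Trade: {average_profit:.2f}")
print(f"Max Profit: {max_profit:.2f}")
print(f"Max Loss: {max_loss:.2f}")
```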
Correlation Analysis¶
10_DMA 0.998379
cumulative_profit 0.957153
debt_to_equity_ratio 0.909119
price_to_book_value 0.891101
other_assets 0.880511
market_capitalization 0.868078
research_&_development 0.861709
enterprise_valuation 0.860277
common_stock 0.853777
property_plant_&_equipment 0.848398
Name: close, dtype: float64
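A ranking like the one above can be produced by correlating every numeric column with `close` and sorting the result. The sketch below uses hypothetical feature names and values. Which columns the notebook excluded beforehand is not shown, and whether it ranked by signed or absolute correlation is likewise an assumption.

```python
import pandas as pd

# Hypothetical data: three numeric features plus a non-numeric column.
df = pd.DataFrame({
    "close": [1.0, 2.0, 3.0, 4.0, 5.0],
    "feature_up": [2.0, 4.0, 6.0, 8.0, 10.0],   # moves exactly with close
    "feature_down": [5.0, 4.0, 3.0, 2.0, 1.0],  # moves exactly against close
    "feature_mixed": [1.0, 3.0, 2.0, 5.0, 4.0],
    "ticker": ["AAPL"] * 5,
})

# Pearson correlation is only defined for numeric columns.
numeric = df.select_dtypes(include="number")

correlations = (
    numeric.corr()["close"]       # correlation of every column with close
    .drop("close")                # close correlates perfectly with itself
    .sort_values(ascending=False)
)
top_10 = correlations.head(10)
print(top_10)
```

Columns derived from `close` itself, such as `10_DMA` or `cumulative_profit`, will rank near the top by construction, so a high value there says little about predictive power.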
Principal Component Analysis¶
Number of components needed for 99% variance explained: 4
((2707, 102), (2707, 103))
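The component count reported above is the smallest number of principal components whose cumulative explained variance reaches the threshold. The sketch below computes that count with NumPy's SVD rather than a PCA library class, which is a substitution made here to keep the example self-contained. The helper name and toy matrix are assumptions, and no feature scaling is applied, although scaling is usually advisable when columns have very different units.

```python
import numpy as np

def components_for_variance(X: np.ndarray, threshold: float = 0.99) -> int:
    """Smallest number of principal components reaching `threshold` cumulative variance."""
    X_centered = X - X.mean(axis=0)  # PCA operates on mean-centered data
    singular_values = np.linalg.svd(X_centered, compute_uv=False)
    # Squared singular values are proportional to the variance along each component.
    explained = singular_values**2 / np.sum(singular_values**2)
    cumulative = np.cumsum(explained)
    # First position where the cumulative ratio is >= threshold, as a 1-based count.
    count = int(np.searchsorted(cumulative, threshold)) + 1
    return min(count, len(cumulative))

# Toy matrix: four columns built from only two uncorrelated latent signals,
# so two components carry essentially all of the variance.
latent_1 = np.linspace(-1.0, 1.0, 101)
latent_2 = latent_1**2
X = np.column_stack([latent_1, latent_2, latent_1 + latent_2, latent_1 - latent_2])

n_components = components_for_variance(X, threshold=0.99)
print(f"Number of components needed for 99% variance explained: {n_components}")
```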